Information filtering based on transferring similarity

In this Brief Report, we propose a new index of user similarity, namely the
transferring similarity, which involves all high-order similarities between
users. Accordingly, we design a modified collaborative filtering algorithm,
which provides remarkably more accurate predictions than the standard
collaborative filtering. More interestingly, we find that the algorithmic
performance approaches its optimal value when the parameter contained in the
definition of transferring similarity gets close to its critical value, before
which the series expansion of transferring similarity is convergent and after
which it is divergent. Our study is complementary to the one reported in
[E. A. Leicht, P. Holme, and M. E. J. Newman, Phys. Rev. E 73, 026120 (2006)],
and is relevant to the missing link prediction problem.

With the exponential growth of the Internet [1] and the World-Wide-Web [2], a
prominent challenge for modern society is information overload. Since there
are enormous amounts of data and sources, people never have the time and
energy to find those most relevant to them. A landmark in solving this problem
is the use of search engines. However, a search engine can only find the
relevant web pages according to the input keywords without taking
personalization into account, and thus returns the same results regardless of
users' habits and tastes. Thus far, with the help of Web 2.0 techniques,
personalized recommendation has become the most promising way to efficiently
filter out the information overload. Motivated by its economic and social
significance, devising efficient and accurate recommendation algorithms has
become a joint focus from theoretical studies [5] to e-commerce applications.
Various kinds of algorithms have been proposed, such as collaborative
filtering (CF), content-based methods, spectral analysis, principal component
analysis [14], network-based inference, and so on.

A recommender system consists of users and objects, and each user has rated
some objects. Denoting the user set as $U = \{u_1, u_2, \cdots, u_N\}$ and the
object set as $O = \{o_1, o_2, \cdots, o_M\}$, the system can be fully
described by an $N \times M$ rating matrix $V$, with $v_{i\alpha}$ denoting the
rating user $u_i$ gives to object $o_\alpha$. If $u_i$ has not yet evaluated
$o_\alpha$, $v_{i\alpha}$ is set to zero. CF has been one of the most
successful and most widely used recommender systems since its appearance in
the mid-1990s. Its basic idea is that a user is recommended objects based on
the weighted combination of similar users' opinions. In the standard CF, the
predicted rating $v'_{i\alpha}$ from user $u_i$ to object $o_\alpha$ is set as:

$$ v'_{i\alpha} = \bar{v}_i + I \sum_j s_{ij} (v_{j\alpha} - \bar{v}_j), \tag{1} $$

where $s_{ij}$ is the similarity between $u_i$ and $u_j$, $\bar{v}_i$ means
the average rating of $u_i$, and $I = (\sum_j s_{ij})^{-1}$ serves as the
normalization factor. Here, $j$ runs over all users having rated object
$o_\alpha$, excluding $u_i$ himself. The similarity, $s_{ij}$, plays a crucial
role in determining the algorithmic accuracy. In the implementation, the
similarity between every pair of users is calculated first, and then the
ratings are predicted by Eq. (1). Various similarity measures have been
proposed, among which the Pearson correlation coefficient is the most widely
used [17], as:



$$ s_{ij} = \frac{\sum_c (v_{ic} - \bar{v}_i)(v_{jc} - \bar{v}_j)}{\sqrt{\sum_\alpha (v_{i\alpha} - \bar{v}_i)^2} \sqrt{\sum_\beta (v_{j\beta} - \bar{v}_j)^2}}, \tag{2} $$

where $c$, $\alpha$ and $\beta$ run over all the objects commonly selected by
users $u_i$ and $u_j$. All diagonal elements of the similarity matrix are set
to zero.
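Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `pearson_similarity` and `predict_rating` are hypothetical, the mean $\bar{v}_i$ is assumed to be taken over all objects rated by $u_i$, the normalization sum is assumed to run over the same users $j$ as the main sum, and the fallback to $\bar{v}_i$ when no usable neighbor exists is an added assumption.

```python
import numpy as np

def pearson_similarity(V):
    """Direct similarity s_ij of Eq. (2). V is N x M; 0 marks 'not rated'."""
    rated = V > 0
    counts = rated.sum(axis=1)
    # Average rating of each user over the objects that user has rated
    # (assumption: the mean is not restricted to co-rated objects).
    means = np.where(counts > 0, V.sum(axis=1) / np.maximum(counts, 1), 0.0)
    N = V.shape[0]
    S = np.zeros((N, N))  # diagonal elements stay zero
    for i in range(N):
        for j in range(i + 1, N):
            common = rated[i] & rated[j]  # objects selected by both users
            if not common.any():
                continue
            di = V[i, common] - means[i]
            dj = V[j, common] - means[j]
            denom = np.sqrt((di ** 2).sum() * (dj ** 2).sum())
            if denom > 0:
                S[i, j] = S[j, i] = (di * dj).sum() / denom
    return S, means

def predict_rating(V, W, means, i, a):
    """Eq. (1): predicted rating of user i on object a with similarity matrix W."""
    raters = np.flatnonzero(V[:, a] > 0)  # users having rated object a
    raters = raters[raters != i]          # excluding user i
    norm = W[i, raters].sum()
    if raters.size == 0 or np.isclose(norm, 0.0):
        return float(means[i])            # assumed fallback, not from the paper
    dev = V[raters, a] - means[raters]
    return float(means[i] + (W[i, raters] * dev).sum() / norm)

# Hypothetical toy data: user 0 has not rated object 2, user 1 has.
V = np.array([[5.0, 3.0, 0.0],
              [4.0, 2.0, 1.0]])
S, means = pearson_similarity(V)
pred = predict_rating(V, S, means, i=0, a=2)
```

With a single neighbor the similarity weight cancels against the normalization factor, so the prediction reduces to $\bar{v}_0 + (v_{12} - \bar{v}_1)$.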

Several algorithms [19] have recently been proposed to improve the accuracy of
the standard CF by modifying the definition of user-user similarity. However,
none of those algorithms has fully addressed the similarity induced by
indirect relationships, namely the high-order correlations. Note that the
Pearson correlation coefficient, $s_{ij}$, considers only the direct
correlation. We argue that to appropriately measure the similarities between
users, the indirect correlations should also be taken into consideration. To
make our idea clearer, we draw an illustration in Fig. 1. Suppose there are
three users, labeled A, B and C. Although the similarity between users A and C
is quite small, A and C are both very similar to B. Actually, A, B and C may
share very similar tastes, and the very small similarity between A and C may
be caused by the sparsity of the data. That is to say, A and C have very few
commonly selected objects. The sparsity of the data set makes the direct
similarity less accurate, and thus we expect that a new measure of similarity
that properly integrates high-order correlations may perform better.

FIG. 1: Illustration for transferring similarity.

TABLE I: The optimal and maximal values of $\epsilon$ for the three cases
corresponding to Figs. 2-4. $\epsilon_{\max}$ is obtained by averaging 20
independent runs, and we have checked that in each run $\epsilon_{\rm opt}$ is
always a little bit smaller than $\epsilon_{\max}$. The resolution of
$\epsilon$ is $10^{-3}$ since for higher resolution (e.g., $10^{-4}$), the
difference between two neighboring data points is very small, and the optimal
value is not distinguishable in the presence of fluctuations.

Data division          90% vs. 10%   50% vs. 50%   10% vs. 90%
$\epsilon_{\rm opt}$   0.0061        0.0063        0.0156
$\epsilon_{\max}$      0.006136      0.006311      0.015642

Denoting by $\epsilon$ the decay factor of similarity transferred by an
intermediate user, a self-consistent definition of transferring similarity can
be written as:

$$ t_{ij} = \epsilon \sum_v s_{iv} t_{vj} + s_{ij}, \tag{3} $$

where $s_{ij}$ is the direct similarity as shown in Eq. (2). The parameter
$\epsilon$ can be considered as the rate of information aging when
transferring one step further [22]. Clearly, the transferring similarity
degenerates to the traditional Pearson correlation coefficient when
$\epsilon = 0$.

FIG. 2: Prediction accuracy of the present algorithm, measured by MAE and
RMSE, as functions of $\epsilon$. The transferring similarities are directly
obtained by Eq. (5). The numerical results are averaged over 20 independent
runs, each corresponding to a random division with the training set containing
about 90% of the data while the probe consists of the remaining 10%. The error
bars denote the standard deviations of the 20 samples.

Denoting by $S = \{s_{ij}\}_{N \times N}$ and $T = \{t_{ij}\}_{N \times N}$
the direct similarity matrix and the transferring similarity matrix, Eq. (3)
can be rewritten in matrix form as:

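$$ T = \epsilon S T + S, \tag{4} $$

whose solution is

$$ T = (\mathbf{1} - \epsilon S)^{-1} S, \tag{5} $$

where $\mathbf{1}$ denotes the $N \times N$ identity matrix. Accordingly, the
prediction score reads

$$ v'_{i\alpha} = \bar{v}_i + I' \sum_j t_{ij} (v_{j\alpha} - \bar{v}_j), \tag{6} $$

where the multiplier $I' = (\sum_j t_{ij})^{-1}$ serves as the normalizing
factor and $j$ runs over all users having rated object $o_\alpha$, excluding
$u_i$ himself.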
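A minimal sketch of Eq. (5) for the three-user situation of Fig. 1 may help to see the effect of the indirect terms. The function name `transferring_similarity`, the toy similarity values, and the choice $\epsilon = 0.3$ are all assumptions made for illustration; solving the linear system of Eq. (4) instead of forming the inverse explicitly is a numerical convenience, not something prescribed by the text.

```python
import numpy as np

def transferring_similarity(S, eps):
    """Solve (1 - eps*S) T = S, which is equivalent to Eq. (5)."""
    n = S.shape[0]
    return np.linalg.solve(np.eye(n) - eps * S, S)

# Hypothetical direct similarities for users A, B, C:
# A-B and B-C are strongly similar, A-C only weakly (e.g. few common objects).
S = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.9],
              [0.1, 0.9, 0.0]])

eps = 0.3  # assumed value, below the convergence threshold of this toy matrix
T = transferring_similarity(S, eps)

# Self-consistency of Eq. (4): T = eps * S T + S.
residual = np.abs(T - (eps * S @ T + S)).max()

print("direct  s_AC =", S[0, 2])
print("transf. t_AC =", T[0, 2])
print("max residual of Eq. (4) =", residual)
```

In this toy case the transferred similarity between A and C is larger than the direct one, because the path through B contributes. Setting `eps = 0` returns `S` itself, in line with the remark that the transferring similarity degenerates to the direct similarity. The matrix `T` can then be used in place of `S` in Eq. (6).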

FIG. 3: Prediction accuracy of the present algorithm, where the division of
training set and probe is 50% vs. 50%. Other conditions are the same as those
presented in Fig. 2.

To test the algorithmic accuracy, we use a benchmark data set, namely
MovieLens, which consists of $N = 943$ users, $M = 1682$ objects, and $10^5$
discrete ratings from 1 to 5. The rating matrix $V$ is sparse: only about 6%
of its entries are nonzero. We first randomly divide this data set into two
parts: one is the training set, treated as known information, and the other is
the probe, whose information is not allowed to be used for prediction. Then we
make a prediction for every entry contained in the probe (resetting
$v'_{i\alpha} = 5$ and $v'_{i\alpha} = 1$ in the case of $v'_{i\alpha} > 5$
and $v'_{i\alpha} < 1$, respectively), and measure the difference between the
predicted rating $v'_{i\alpha}$ and the actual rating $v_{i\alpha}$. For
evaluating the accuracy of recommendations, many different metrics have been
proposed. We choose two commonly used measures: root-mean-square error (RMSE)
and mean absolute error (MAE). They are defined as

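$$ \mathrm{RMSE} = \sqrt{\sum_{(i,\alpha)} (v'_{i\alpha} - v_{i\alpha})^2 / E}, \tag{7a} $$

$$ \mathrm{MAE} = \frac{1}{E} \sum_{(i,\alpha)} |v'_{i\alpha} - v_{i\alpha}|, \tag{7b} $$

where the subscript $(i, \alpha)$ runs over all the elements in the probe, and
$E$ is the number of those elements.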
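The evaluation procedure can be sketched as follows. The helper names `split_ratings` and `rmse_mae`, the seed handling, and the toy matrix are assumptions for illustration only. Each known rating is assigned to the training set independently with probability `train_fraction`, which yields a training set containing about that fraction of the data.

```python
import numpy as np

def split_ratings(V, train_fraction=0.9, seed=0):
    """Randomly divide the known ratings of V into a training matrix and a probe list."""
    rng = np.random.default_rng(seed)
    users, objects = np.nonzero(V)                  # positions of known ratings
    in_train = rng.random(users.size) < train_fraction
    train = np.zeros_like(V)
    train[users[in_train], objects[in_train]] = V[users[in_train], objects[in_train]]
    probe = [(int(i), int(a), float(V[i, a]))
             for i, a in zip(users[~in_train], objects[~in_train])]
    return train, probe

def rmse_mae(predicted, actual, lo=1.0, hi=5.0):
    """Eqs. (7a) and (7b), after resetting predictions to the range [lo, hi]."""
    p = np.clip(np.asarray(predicted, dtype=float), lo, hi)
    err = p - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))

# Hypothetical toy rating matrix (0 = not rated).
V = np.array([[5.0, 0.0, 3.0, 4.0],
              [0.0, 4.0, 1.0, 2.0]])
train, probe = split_ratings(V, train_fraction=0.5, seed=1)

# Predictions 5.7 and 0.2 are reset to 5 and 1 before the errors are computed.
rmse, mae = rmse_mae([5.7, 3.0, 0.2], [5.0, 4.0, 1.0])
```

In a full experiment, the predicted ratings would come from Eq. (1) or Eq. (6) evaluated on `train` for every `(i, a)` pair listed in `probe`.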

In Figs. 2-4, we report the numerical results on the algorithmic accuracy,
where the divisions of training set and probe are 90% vs. 10%, 50% vs. 50%,
and 10% vs. 90%, respectively. In every case, there exists an optimal value of
$\epsilon$, denoted by $\epsilon_{\rm opt}$, corresponding to both the lowest
MAE and the lowest RMSE. Around the optimal value, $\epsilon_{\rm opt}$, the
present algorithm clearly outperforms the standard CF. The optimal values are
different for different cases, and the one corresponding to sparser data is
larger.

FIG. 4: Prediction accuracy of the present algorithm, where the division of
training set and probe is 10% vs. 90%. Other conditions are the same as those
presented in Fig. 2.

To get some insight into the physical meaning of $\epsilon_{\rm opt}$, we
expand Eq. (5) in a power series, as:

$$ T = S + \epsilon S^2 + \epsilon^2 S^3 + \cdots. \tag{8} $$

Note that this formula is also of practical significance, since directly
inverting $(\mathbf{1} - \epsilon S)$ takes a long time for huge systems, and
the cutoff of Eq. (8),

$$ T = S + \epsilon S^2 + \cdots + \epsilon^n S^{n+1}, \tag{9} $$

is usually used as an approximation in the implementation (in this paper,
since the system size is not too large, we always directly use Eq. (5) to
obtain the transferring similarity matrix). However, even if
$(\mathbf{1} - \epsilon S)$ is invertible, Eq. (8) may not be convergent.
Actually, Eq. (8) is convergent if and only if all the eigenvalues of
$\epsilon S$ are strictly smaller than 1 in absolute value, that is, the
spectral radius of $\epsilon S$ is less than 1. The mathematical proof of a
very similar proposition using Jordan matrix decomposition can be found in
Ref. [22]. Although Ref. [22] only gives the proof of the sufficient
condition, the necessary condition can be proved in an analogous way.
Accordingly, there exists a critical point of $\epsilon$, before which the
spectral radius of $\epsilon S$ is less than 1 and after which it exceeds 1.
Since this critical value is also the maximal value of $\epsilon$ that keeps
the convergence of Eq. (8), we denote it by $\epsilon_{\max}$. The optimal and
maximal values of $\epsilon$ for the three cases corresponding to Figs. 2-4
are presented in Table I. It is very interesting that $\epsilon_{\rm opt}$ is
always smaller than, yet very close to, $\epsilon_{\max}$.
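The convergence statement can be checked numerically on a small example. The sketch below is illustrative only: `truncated_series` and `spectral_radius` are hypothetical helper names, the $3 \times 3$ matrix is made up, and the factors 0.9 and 1.1 are arbitrary choices on either side of the critical value. It compares the cutoff of Eq. (9) with the closed form of Eq. (5).

```python
import numpy as np

def truncated_series(S, eps, n):
    """Eq. (9): S + eps*S^2 + ... + eps^n * S^(n+1)."""
    term = S.copy()
    total = S.copy()
    for _ in range(n):
        term = eps * (term @ S)  # next term of the series
        total = total + term
    return total

def spectral_radius(S):
    """Largest absolute eigenvalue of a symmetric matrix S."""
    return float(np.abs(np.linalg.eigvalsh(S)).max())

# Hypothetical symmetric similarity matrix with zero diagonal.
S = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.9],
              [0.1, 0.9, 0.0]])
eps_max = 1.0 / spectral_radius(S)

# Below the critical value: the truncated series approaches Eq. (5).
eps_lo = 0.9 * eps_max
exact_lo = np.linalg.solve(np.eye(3) - eps_lo * S, S)
gap = np.abs(truncated_series(S, eps_lo, 400) - exact_lo).max()

# Above the critical value: (1 - eps*S) is still invertible here,
# but the partial sums of Eq. (8) keep growing.
eps_hi = 1.1 * eps_max
norm_50 = np.linalg.norm(truncated_series(S, eps_hi, 50))
norm_200 = np.linalg.norm(truncated_series(S, eps_hi, 200))

print("eps_max =", eps_max)
print("gap below eps_max =", gap)
print("partial-sum norms above eps_max:", norm_50, norm_200)
```

For a large system, one would presumably pick the cutoff $n$ by monitoring the size of the last added term rather than fixing it in advance.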

In summary, we designed an improved collaborative filtering algorithm based on
a newly proposed similarity measure, namely the transferring similarity.
Different from the traditional definitions of similarity that consider the
direct correlation only, the transferring similarity integrates all the
high-order (i.e., indirect) correlations. The numerical testing on a benchmark
data set has demonstrated the improvement of algorithmic accuracy compared
with the standard CF algorithm. Very recently, Liu et al. [21] and Zhou et al.
[23] proposed some modified recommendation algorithms under the frameworks of
collaborative filtering [21] and random-walk-based recommendations [23],
respectively. By taking into account both the direct and the second-order
correlations, their algorithms can remarkably enhance the prediction accuracy.
These works can be considered as a bridge connecting the
nearest-neighborhood-based information filtering algorithms and the present
work.

Very interestingly, we found that the optimal value of $\epsilon$ is always
smaller than, yet very close to, the maximal value of $\epsilon$ that
guarantees the convergence of the power series expansion of the transferring
similarity. The significance of this finding is twofold. Firstly, Leicht,
Holme and Newman [24] have recently proposed a new index of node similarity,
which is actually a variant of the well-known Katz index [25]. The numerical
tests [24] showed that their index best reproduces the known correlations
between nodes when the parameter is very close to its maximal value that
guarantees the convergence of the power series expansion. Although their work
and the current work originate from different motivations and use different
testing methods, the results are surprisingly coincident. Despite the
insufficiency of empirical studies and the lack of analytical insights, this
finding should be of theoretical interest. Secondly, $\epsilon_{\max}$ is
equal to the inverse of the maximum eigenvalue of $S$, $\lambda_{\max}^{-1}$.
Therefore, it is easy to determine $\epsilon_{\max}$, since fast algorithms
for calculating $\lambda_{\max}$ of a given matrix are well developed (see,
for example, the power iteration method in Ref. [26]). When dealing with an
unknown system, we can first calculate $\lambda_{\max}$, and then concentrate
the search for $\epsilon_{\rm opt}$ on the area around $\lambda_{\max}^{-1}$,
which can save computations in real applications.
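The search strategy just described can be sketched as follows. This is a plain textbook power iteration, not necessarily the variant given in Ref. [26]; the names `power_iteration` and `candidate_eps`, the stopping tolerance, and the width of the search window are assumptions for illustration. The sketch also assumes that the eigenvalue of largest magnitude is the positive one, as in the toy matrix used here.

```python
import numpy as np

def power_iteration(S, iters=2000, tol=1e-12, seed=0):
    """Estimate the dominant eigenvalue of a symmetric matrix S."""
    rng = np.random.default_rng(seed)
    x = rng.random(S.shape[0])
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(iters):
        y = S @ x
        norm = np.linalg.norm(y)
        if norm == 0.0:
            return 0.0
        y /= norm
        new_lam = float(y @ S @ y)  # Rayleigh quotient of the current iterate
        converged = abs(new_lam - lam) < tol
        x, lam = y, new_lam
        if converged:
            break
    return lam

def candidate_eps(eps_max, window=0.2, num=20):
    """Grid of eps values in [(1 - window) * eps_max, eps_max), excluding eps_max."""
    return np.linspace((1.0 - window) * eps_max, eps_max, num, endpoint=False)

# Hypothetical symmetric similarity matrix with zero diagonal.
S = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.9],
              [0.1, 0.9, 0.0]])

lam_max = power_iteration(S)
eps_max = 1.0 / lam_max
grid = candidate_eps(eps_max)

print("lambda_max =", lam_max)
print("eps_max =", eps_max)
```

Each value in `grid` would then be used to build $T$ and evaluated on the probe, keeping the one with the lowest error.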

Very recently, a fresh issue has been raised in the physics community, that
is, how to predict missing links of complex networks. The fundamental problem
is to determine the proximities, or similarities, between node pairs [29, 30].
The similarity index presented here is not only an extension of the Pearson
correlation coefficient in rating systems, but can also be easily extended to
quantify the structural similarity of node pairs in general networks based on
any locally defined similarity index. We believe this self-consistent
definition of similarity (see Eq. (3)) can find applications in the link
prediction problem.